The experiment was designed to assist the NAS (Numerical Aerodynamic Simulation)
Project Office in the testing and evaluation of long haul communications for remote 
users. The objectives of this work were to: 

(1) Use foreign workstations to remotely access the NAS system. In this way problems associated with 
interfacing equipment commonly used by university and industrial investigators to the NAS facility can 
be identified and solutions to these problems found. 

(2) Provide NAS with a link to a large university-based computing facility which can serve as a model 
for a regional node of the Long-Haul Communications Subsystem (LHCS). 

(3) Provide a tail circuit to the University of Colorado at Boulder, thereby simulating the complete
communications path from NAS through a regional node to an end-user.


To meet these objectives the Institute for Computational Studies (ICS) took delivery 
of a Sun-2 160 color workstation in the first part of June. Another Sun-2 160 color 
workstation arrived in early July. 

The first problem encountered by NAS/ICS was the inability of the Sun workstation 
to run a graphics software package (PLOT3D) available on NASA’s Silicon Graphics 
IRIS workstation. Becky Olson spent a week at NASA/Ames in May learning about 
the graphics package and the IRIS. She spent the next month porting it to the Sun
workstation so the communication experiment could take place. The modified source
code for the Sun version of PLOT3D is listed in Appendix I. 

The 56 kilobaud line and the data service unit were installed during the week of July 
29. By August 1, NASA/Ames and ICS were communicating through the 56 kilobaud 
line, providing NAS with a link to a university. 

The next phase of the experiment was to determine whether the 56 kilobaud line was
adequate for actual development work and for data and graphics file transfers. Julie
Swisshelm spent the week of August 19 through 23 conducting Computational Fluid
Dynamics (CFD) runs at NASA/Ames to establish a baseline for the time it takes
to run a CFD problem and transfer files locally at NASA/Ames.

The next week (August 26 through August 30) the CFD runs were done at ICS 
through the 56 kilobaud line with statistical data accumulated to see if the line speed 
was acceptable. This data is being processed by NASA/Ames to evaluate the 
difference between on-site users and remote users. In September, ICS took delivery 
of a Silicon Graphics IRIS workstation. This was added to our local area network and
also used to access the Cray X-MP/12 and Cray-2 at NASA/Ames.

By October most of the problems associated with the long haul communications
experiment, such as the instability of Vitalink and the response time of Amelia, had
been fixed and a test production mode was begun.
During the next few months, an efficient algorithm (CNS3D) for Navier-Stokes
simulations of complex aircraft and turbomachinery geometries was developed. The
paper describing the results of this algorithm development has been accepted for
presentation at the Tenth International Conference on Numerical Methods in Fluid
Dynamics, to be held in Beijing, China on June 23-27, 1986. The extended abstract
for this conference is included as Appendix II to this report. A code to do the
embedded multigridding for use in the algorithm was developed on the Sun
workstations/CSU CYBER 205 for a rectilinear cascade of finite-span, swept blades
mounted between endwalls. This code was made operational on the NAS system. This
allowed the code to use the large memory capability, so that realistically fine
meshes for the current turbomachinery test geometry can be generated, and will allow
extensions of the code to include complete aircraft and turbomachinery geometries.
Other code was developed and old code converted to run on the Cray-2. So far we
have converted a library of I/O routines that we have found helpful in debugging code
on both the Cray X-MP and the CYBER 205. Unfortunately, all of them use features
not currently supported by the Cray-2 FORTRAN compiler. In addition to that
library, a set of routines that interpolate efficiently between grid levels in multiple-grid
problems has been developed, with the Cray-2 as the target machine. That is, we
have been careful to build into the code several parameters that can be adjusted if the
long memory access time becomes a problem. A code to do three-dimensional
incompressible viscous calculations has also been implemented on the Cray-2.
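
As a rough illustration of what interpolating between grid levels involves, the sketch below transfers values between a coarse and a fine one-dimensional grid in C. It is not the routine set described above: the function names `prolong_1d` and `restrict_1d`, the factor-of-two refinement, and the linear and full-weighting stencils are all assumptions chosen for illustration, and none of the Cray-2 tuning parameters mentioned in the text are modeled.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the report): linear interpolation from a
 * coarse 1-D grid of nc points (nc >= 1) onto a fine grid of
 * 2*(nc-1)+1 points. Coincident points are copied, in-between points
 * are the average of their two coarse neighbours.
 */
void prolong_1d(const double *coarse, size_t nc, double *fine)
{
    size_t i;

    for (i = 0; i + 1 < nc; i++) {
        fine[2 * i]     = coarse[i];
        fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1]);
    }
    fine[2 * (nc - 1)] = coarse[nc - 1];
}

/*
 * Hypothetical helper (not from the report): full-weighting restriction
 * from the fine grid back to the coarse grid. Boundary points are
 * copied, interior points use the 1/4, 1/2, 1/4 stencil.
 */
void restrict_1d(const double *fine, size_t nc, double *coarse)
{
    size_t i;

    coarse[0] = fine[0];
    for (i = 1; i + 1 < nc; i++)
        coarse[i] = 0.25 * fine[2 * i - 1]
                  + 0.50 * fine[2 * i]
                  + 0.25 * fine[2 * i + 1];
    coarse[nc - 1] = fine[2 * (nc - 1)];
}
```

With coarse values 0, 2, 4, `prolong_1d` fills the fine grid with 0, 1, 2, 3, 4, and `restrict_1d` applied to that result recovers 0, 2, 4.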

The first two objectives of the experiment have been met. We discovered that the
researchers preferred to do most of their editing on the local workstations and ship the
code over, as opposed to editing at the remote site. This was due to their familiarity with
the editor on the local workstation and to the response time and convenience of a full
screen editor. Utilities were created that allowed the researchers to bypass any
interaction with Amelia and move files directly from the local workstation to the Cray-2,
and from the Cray-2 to our local workstations. These utilities are available from ICS
on request. The researchers preferred to use the local workstations and just submit
files to the Cray-2. However, this preference could switch if a user-friendly full screen
editor were available on the Cray-2. It was discovered that 56 kilobaud was an
adequate line speed for transferring source codes and small to medium size data sets. For
very large data sets, 56 kilobaud can create very long transfer times. Our timings and
calculations show that, over the 56 kilobaud connection, it takes about 8 minutes to
transfer a 3,055,850 byte file from our Sun to Amelia using the command 'rcp'. The
command 'ftp' is much slower, taking 11 minutes to transfer the same size file. Using
rcp and the 'to2' command to transfer that same file from our Sun to the Cray-2
takes 11 minutes. Finally, the researchers preferred to generate the graphics files on
the workstations as opposed to generating them on the remote machine. This was due
in part to the size of the metafile that would have to be shipped back and also to the
interactive capability of changing the characteristics of the graphics file at the local
workstation.
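
The transfer timings quoted above can be put in perspective with a short calculation. The C sketch below compares a measured transfer time with the ideal time for the same file on the same line. It assumes that "56 kilobaud" means 56,000 bits per second and it ignores all protocol overhead; the function names are hypothetical and are not part of the utilities mentioned in the text.

```c
/*
 * Ideal time, in seconds, to move nbytes over a line carrying line_bps
 * bits per second. Assumption: 8 bits per byte and no framing or
 * protocol overhead.
 */
double ideal_transfer_seconds(double nbytes, double line_bps)
{
    return nbytes * 8.0 / line_bps;
}

/*
 * Fraction of the nominal line capacity achieved by a transfer that
 * actually took measured_seconds.
 */
double line_utilization(double nbytes, double line_bps, double measured_seconds)
{
    return ideal_transfer_seconds(nbytes, line_bps) / measured_seconds;
}
```

For the 3,055,850 byte file the ideal time is about 437 seconds (7.3 minutes), so under these assumptions the 8-minute transfer used roughly 91 percent of the nominal line capacity and the 11-minute transfers roughly 66 percent.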
The third objective, that of providing a tail circuit to Boulder, has not yet been
accomplished. Completion of this circuit has been delayed by slippage in the schedules for
LAN interconnections at both CU Boulder and CSU and by slippage in the schedule
for the reconfiguration of the Colorado Advanced Technology Institute (CATI) funded
56Kb link between the CU Boulder and CSU computing centers. To date, we have
no indication that these delays are anything more than normal schedule slippage.
Work is continuing beyond the period of performance of the present grant and
should be complete by mid-summer.

Recommendations 

Two items would enhance the convenience of the current configuration. First,
it would be helpful at times to be able to submit batch jobs to the Cray-2 from the
local workstations and have the job output automatically returned to the local
workstation or LAN, without the user having to log into any other machine. This would be
particularly helpful when doing production runs, where there is no real need to be
interactively logged into the Cray-2. We developed this capability for the NAS Project
Cray X-MP/12 and found it to be a very valuable tool. Incidentally, such an arrangement
would also relieve the problem of line contention on the Cray-2 which, though not
critical now, may become so in time. On the workstation side, a job queuing mechanism
would be useful, so that jobs could be submitted to the Cray-2, and eventually run,
even when one or more of the links in the path to the machine are temporarily down.
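
The workstation-side queuing mechanism recommended above could take many forms. As one minimal sketch in C, the code below holds job names in a fixed-size first-in, first-out queue and forwards them in order, stopping at the first failure so that the remaining jobs are retried on the next attempt. Everything here is hypothetical: the names `job_queue`, `queue_submit`, `queue_flush`, the size limits, and the `demo_send` stand-in for a real transfer routine are illustrative assumptions, not an existing NAS or ICS utility.

```c
#include <stddef.h>
#include <string.h>

#define MAX_JOBS 16   /* assumed queue capacity */
#define MAX_NAME 64   /* assumed maximum job-name length, including NUL */

struct job_queue {
    char   names[MAX_JOBS][MAX_NAME];
    size_t head;    /* index of the oldest pending job */
    size_t count;   /* number of pending jobs */
};

void queue_init(struct job_queue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Add a job; returns 0 on success, -1 if the queue is full or the name is too long. */
int queue_submit(struct job_queue *q, const char *name)
{
    size_t slot;

    if (q->count == MAX_JOBS || strlen(name) >= MAX_NAME)
        return -1;
    slot = (q->head + q->count) % MAX_JOBS;
    strcpy(q->names[slot], name);
    q->count++;
    return 0;
}

/*
 * Forward pending jobs in submission order. send_job is supplied by the
 * caller and returns 0 on success, nonzero when a link is down. The
 * first failure stops the pass and leaves that job and all later ones
 * queued. Returns the number of jobs forwarded.
 */
size_t queue_flush(struct job_queue *q, int (*send_job)(const char *name))
{
    size_t sent = 0;

    while (q->count > 0 && send_job(q->names[q->head]) == 0) {
        q->head = (q->head + 1) % MAX_JOBS;
        q->count--;
        sent++;
    }
    return sent;
}

/* Stand-in for a real transfer routine: succeeds only while link_is_up is nonzero. */
int link_is_up = 1;

int demo_send(const char *name)
{
    (void)name;
    return link_is_up ? 0 : 1;
}
```

A periodic call to `queue_flush` would then deliver queued jobs as soon as the path to the Cray-2 is available again; returning job output to the workstation is outside the scope of this sketch.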

Second, we realize that the software on the Cray-2 is as yet unfinished, but we feel
that in time the machine environment ought to be made entirely compatible with that
of other Cray models. Two needs are noteworthy: total portability of FORTRAN code
and a Cray-compatible version of Update. There are products on the market,
such as Historian from Opcode Inc., that emulate and extend the features of Update.
Perhaps the NAS project could develop a version of Update, or at least an Update
subset that would handle Update source files from other Cray environments.
Appendix I

/*
    disspla lookalike library
*/

#include <usercore.h>
#include <sun/fbio.h>
#include "disspla.h"

/* */ 

sun2()
/*
    initialize sun core and screen
*/
{
    set_up_core2();
/*
    set up random generator for segment numbers
*/
    srand(1);
/*
    set up color table
*/
    define_color_indices(&vsurf, 0, MAPSIZE, red, green, blue);
}

/* */ 

sun3()
/*
    initialize sun core and screen
*/
{
    int i;
    extern TWODIM;

    set_up_core3();
    TWODIM = FALSE;
/*
    set up random generator for segment numbers
*/
    srand(1);
/*
    set up color table
*/
    define_color_indices(&vsurf, 0, MAPSIZE, red, green, blue);
}

/* */

set_up_core2()
{
    extern struct vwsurf vsurf;

    get_view_surface(&vsurf, NULL);
    vsurf.cmapsize = MAPSIZE;
    vsurf.cmapname[0] = '\0';
/*
    pump to full capability
*/
    if (initialize_core(DYNAMICC, SYNCHRONOUS, TWOD)) exit(0);
    if (initialize_view_surface(&vsurf, FALSE)) exit(0);
/*
    set ndc to default
*/
    set_ndc_space_2(1.0, 0.75);
/*
    initialize devices
*/
    initdev();
/*
    set clipping capability
*/
    set_window_clipping(TRUE);
    set_output_clipping(TRUE);
    select_view_surface(&vsurf);
/*
    set up transformation types
*/
    set_image_transformation_type(XFORM2);
}

/* */ 

set_up_core3()
{
    extern struct vwsurf vsurf;

    get_view_surface(&vsurf, NULL);
    vsurf.cmapsize = MAPSIZE;
    vsurf.cmapname[0] = '\0';
/*
    pump to full capability
*/
    if (initialize_core(DYNAMICC, SYNCHRONOUS, THREED)) exit(0);
    if (initialize_view_surface(&vsurf, FALSE)) exit(0);
/*
    initialize devices
*/
    initdev();
/*
    set clipping capability
*/
    set_window_clipping(TRUE);
    set_output_clipping(TRUE);
    select_view_surface(&vsurf);
/*
    set transformation capability
*/
    set_image_transformation_type(XFORM3);
}

/* */ 

initdev()
{
/*
    initialize input devices
*/
    initialize_device(BUTTON, 1);      /* initialize input devices */
    initialize_device(BUTTON, 2);      /* initialize input devices */
    initialize_device(BUTTON, 3);      /* initialize input devices */
    initialize_device(LOCATOR, 1);
    initialize_device(PICK, 1);
    set_echo_position(LOCATOR, 1, 0., 0.);
    set_echo_surface(LOCATOR, 1, &vsurf);
    set_echo_surface(PICK, 1, &vsurf);
    set_pick(1, 0.001);
}


graf3d(x3min, x3stp, x3max, y3min, y3stp, y3max, z3min, z3stp, z3max)
/*
    3d window set, no axis drawn on 3d
*/
float *x3min, *x3stp, *x3max;
float *y3min, *y3stp, *y3max;
float *z3min, *z3stp, *z3max;
{
    float xndcchar1, xndcchar2, yndcchar1, yndcchar2;
    float xwldchar1, xwldchar2, ywldchar1, ywldchar2;
    float zndcchar1, zndcchar2;
    float zwldchar1, zwldchar2;
    float xndc, yndc, xct, yct;

    xmin = *x3min;
    xstp = *x3stp;
    xmax = *x3max;
    ymin = *y3min;
    ystp = *y3stp;
    ymax = *y3max;
    zmin = *z3min;
    zstp = *z3stp;
    zmax = *z3max;
    zndcchar1 = 0.;
    zndcchar2 = 0.;

    set_viewport_3(.0, 1., .0, .75, .0, .75);
    xct = (xmax - xmin) / 10.;
    yct = (ymax - ymin) / 10.;
    set_window(xmin - xct, xmax + xct, ymin, ymax + yct);
    xndcchar1 = .2;
    yndcchar1 = .2;
    xndcchar2 = .27;
    yndcchar2 = .27;
    map_ndc_to_world_3(xndcchar1, yndcchar1, zndcchar1,
                       &xwldchar1, &ywldchar1, &zwldchar1);
    map_ndc_to_world_3(xndcchar2, yndcchar2, zndcchar2,
                       &xwldchar2, &ywldchar2, &zwldchar2);
    charwiddef = xwldchar2 - xwldchar1;
    charheidef = ywldchar2 - ywldchar1;
    if (charwiddef < 0)
        charwiddef = -1 * charwiddef;
    if (charheidef < 0)
        charheidef = -1 * charheidef;
    charheinew = charheidef;
    charwidnew = charwiddef;
/*
    set type of character ie font and set height and width for
    labels
*/
    set_charprecision(CHARACTER);
    set_font(STICK);
    set_charsize(charwiddef, charheidef);
}

/* */ 

ftoa(r, whole)
/*
    change from a real number to character
*/
float r;
char whole[20];
{
    float rem, fnum;
    int num, neg, cnt;
    int i;
    char str[20];

    if (r < 0.) {
        neg = TRUE;
        r *= -1.;
    }
    else
        neg = FALSE;
    cnt = 0;
    num = i = r;
    fnum = i;
    rem = r - fnum;
    if (rem > 0.)
        str[cnt++] = '.';
    /* one fractional digit per pass; bounded so str[] cannot overflow */
    while (rem > 0. && cnt < 19) {
        rem *= 10.;
        i = rem;
        str[cnt++] = i + '0';
        fnum = i;
        rem -= fnum;
    }
    str[cnt] = '\0';
    if (neg) {
        whole[0] = '-';
        itoa(num, &whole[1]);
    }
    else {
        whole[0] = ' ';
        itoa(num, &whole[1]);
    }
    strcat(whole, str);
}

/* */

itoa(n, sti)
/*
    change integer to a string
*/
char sti[];
int n;
{
    int i, sign;

    if ((sign = n) < 0)
        n = -n;
    i = 0;
    do {
        sti[i++] = n % 10 + '0';
    } while ((n /= 10) > 0);
    if (sign < 0)
        sti[i++] = '-';
    sti[i] = '\0';
    reverse(sti);
}

/* */ 

reverse(srev)
/*
    reverse the order of a string
*/
char srev[];
{
    int c, i, j;

    for (i = 0, j = strlen(srev) - 1; i < j; i++, j--) {
        c = srev[i];
        srev[i] = srev[j];
        srev[j] = c;
    }
}


messag_(string, xpos, ypos)
/*
    put a text string on screen starting at xpos, ypos
*/
char string[];
float *xpos, *ypos;
{
    float x, y;

    x = *xpos;
    y = *ypos;

    set_charsize(charwiddef, charheidef);
    move_abs_2(x, y);
    text(string);
}

/* */ 

headin_(title, hei, num)
/*
    write a title or label, letting the mouse
    prompt for the position
*/
char title[];
int *num;
float *hei;
{
    float xpos, ypos;
    float xndc, yndc;
    float xs, xe;
    float heig;
    int butnum;

    heig = *hei;
/*
    set character type to character so it can be manipulated, ie
    rotated etc
*/
    set_charprecision(CHARACTER);
    set_font(STICK);
    charwidnew = charwiddef * heig;
    charheinew = charheidef * heig;
    set_charsize(charwidnew, charheinew);
    butnum = 0;
/*
    wait for mouse click to determine location
*/
    set_echo(LOCATOR, 1, 1);
    printf(" click a mouse button at the position the title should begin\n");
    do {
        await_any_button_get_locator_2(10000000, 1, &butnum, &xndc, &yndc);
    } while (butnum < 1 || butnum > 3);
/*
    write text
*/
    map_ndc_to_world_2(xndc, yndc, &xpos, &ypos);
    move_abs_2(xpos, ypos);
    text(title);
/*
    reset character size
*/
    set_charsize(charwiddef, charheidef);
}

/* */

x3name_(title, len)
/*
    print x axis label
*/
char title[];
int *len;
{
    float xpos, ypos;
    float xndc, yndc;
    float xs, xe;
    int titlen;
    int butnum;

    titlen = *len;
/*
    set charsize
*/
    set_charsize(charwidnew, charheinew);
    charwidnew = charwiddef;
    charheinew = charheidef;
/*
    create segment for x axis label
*/
    segnum = rand();
    create_retained_segment(segnum);
    butnum = 0;
/*
    wait for mouse button for location
*/
    set_echo(LOCATOR, 1, 1);
    printf(" click a mouse button at the position the x label should begin\n");
    do {
        await_any_button_get_locator_2(10000000, 1, &butnum, &xndc, &yndc);
    } while (butnum < 1 || butnum > 3);
/*
    write out text
*/
    map_ndc_to_world_2(xndc, yndc, &xpos, &ypos);
    move_abs_2(xpos, ypos);
    text(title);
    close_retained_segment();
    set_charsize(charwiddef, charheidef);
}

/* */

y3name_(title, len)
/*
    print y axis label
*/
char title[];
int *len;
{
    float xpos, ypos;
    float xndc, yndc;
    float xs, xe;
    int titlen;
    int butnum;

    titlen = *len;
    set_charsize(charheinew, charwidnew);
    charwidnew = charwiddef;
    charheinew = charheidef;
/*
    create segment for y axis
*/
    segnum = rand();
    create_retained_segment(segnum);
/*
    make sure it is going up y axis
*/
    set_charup_2(-1., 0.);
    set_charpath_2(0., 1.);
    butnum = 0;
/*
    wait for mouse button for location
*/
    set_echo(LOCATOR, 1, 1);
    printf(" click a mouse button at the position the y label should begin\n");
    do {
        await_any_button_get_locator_2(10000000, 1, &butnum, &xndc, &yndc);
    } while (butnum < 1 || butnum > 3);
/*
    write out text
*/
    map_ndc_to_world_2(xndc, yndc, &xpos, &ypos);
    move_abs_2(xpos, ypos);
    text(title);
/*
    revert character direction back to normal
*/
    set_charup_2(0., 1.);
    set_charpath_2(1., 0.);
    close_retained_segment();
    set_charsize(charwiddef, charheidef);
}

/* */ 

z3name_(title, len)
/*
    print z axis label
*/
char title[];
int *len;
{
    float xpos, ypos;
    float xndc, yndc;
    float xs, xe;
    int titlen;
    int butnum;

    titlen = *len;
    set_charsize(charheinew, charwidnew);
    charwidnew = charwiddef;
    charheinew = charheidef;
/*
    create segment for z axis
*/
    segnum = rand();
    create_retained_segment(segnum);
/*
    make sure it is going up y axis
*/
    set_charup_2(-1., 0.);
    set_charpath_2(0., 1.);
    butnum = 0;
/*
    wait for mouse button for location
*/
    set_echo(LOCATOR, 1, 1);
    printf(" click a mouse button at the position the z label should begin\n");
    do {
        await_any_button_get_locator_2(10000000, 1, &butnum, &xndc, &yndc);
    } while (butnum < 1 || butnum > 3);
/*
    write out text
*/
    map_ndc_to_world_2(xndc, yndc, &xpos, &ypos);
    move_abs_2(xpos, ypos);
    text(title);
/*
    revert character direction back to normal
*/
    set_charup_2(0., 1.);
    set_charpath_2(1., 0.);
    close_retained_segment();
    set_charsize(charwiddef, charheidef);
}


curve(xaray, yaray, npnts, imark)
/*
    actually plots the data, ie xaray by yaray
*/
float *xaray, *yaray;
int *imark, *npnts;
{
    float xval[NUMELEMENTS], yval[NUMELEMENTS];
    int num, mark, posmark;
    int sym;
    int i, j;

    num = *npnts;
    mark = *imark;

    set_window_clipping(TRUE);
    set_output_clipping(TRUE);

    posmark = mark;
    if (mark < 0)
        posmark *= -1;
    for (i = 0; i < num+1; i++) {
        xval[i] = *xaray++;
        yval[i] = *yaray++;
    }
    segnum = rand();
    create_retained_segment(segnum);
    if (mark != 0) {
        /* no symbol chosen by the caller: step to the next one in marksym[] */
        if (!SETSYMBOL) {
            inquire_marker_symbol(&sym);
            sym += 1;
            if (sym > 18)
                sym = 1;
            j = marksym[sym-1];
            set_marker_symbol(j);
        }
        for (i = 0; i < num+1; i += posmark)
            marker_abs_2(xval[i], yval[i]);
        SETSYMBOL = FALSE;
    }
    move_abs_2(xval[0], yval[0]);
    if (mark >= 0)
        polyline_abs_2(xval, yval, num);
    close_retained_segment();
}

/* */ 

line3(npnts, xaray, yaray, zaray)
/*
    actually plots the data, ie xaray by yaray
*/
float *xaray, *yaray, *zaray;
int *npnts;
{
    float xval[NUMELEMENTS], yval[NUMELEMENTS], zval[NUMELEMENTS];
    int num;
    int sym;
    int i, j;

    num = *npnts;
    for (i = 0; i < num+1; i++) {
        xval[i] = *xaray++;
        yval[i] = *yaray++;
        zval[i] = *zaray++;
    }
    move_abs_3(xval[0], yval[0], zval[0]);
    polyline_abs_3(xval, yval, zval, num);
}

/* */

line2(npnts, xaray, yaray)
/*
    actually plots the data, ie xaray by yaray
*/
float *xaray, *yaray;
int *npnts;
{
    float xval[NUMELEMENTS], yval[NUMELEMENTS];
    int num;
    int sym;
    int i, j;

    num = *npnts;
    for (i = 0; i < num+1; i++) {
        xval[i] = *xaray++;
        yval[i] = *yaray++;
    }
    move_abs_2(xval[0], yval[0]);
    polyline_abs_2(xval, yval, num);
}

/* */

rlvec(xpnt1, ypnt1, xpnt2, ypnt2, ivec)
/*
    actually plots a line
*/
float *xpnt1, *ypnt1;
float *xpnt2, *ypnt2;
int *ivec;
{
    float xfrom, yfrom, xto, yto;
    int num, mark, posmark;
    int isym;

    xfrom = *xpnt1;
    yfrom = *ypnt1;
    xto = *xpnt2;
    yto = *ypnt2;

    segnum = rand();
    create_retained_segment(segnum);
    move_abs_2(xfrom, yfrom);
    line_abs_2(xto, yto);
    close_retained_segment();
}

/* */

vecto3(xpnt1, ypnt1, zpnt1, xpnt2, ypnt2, zpnt2)
/*
    actually plots a 3-d line
*/
float *xpnt1, *ypnt1, *zpnt1;
float *xpnt2, *ypnt2, *zpnt2;
{
    float xfrom, yfrom, zfrom, xto, yto, zto;
    int arrsym;

    xfrom = *xpnt1;
    yfrom = *ypnt1;
    zfrom = *zpnt1;
    xto = *xpnt2;
    yto = *ypnt2;
    zto = *zpnt2;

    arrsym = 62;
    move_abs_3(xfrom, yfrom, zfrom);
    line_abs_3(xto, yto, zto);
}

/* */ 

object(array)
/*
    open a segment
*/
char *array;
{
    segnum = rand();
    create_retained_segment(segnum);
}

/* */ 

endobj()
/*
    close a segment
*/
{
    popatt_();
    close_retained_segment();
}

/* */ 

strtpt(xpnt1, ypnt1)
/*
    plots a point
*/
float *xpnt1, *ypnt1;
{
    float xfrom, yfrom;

    xfrom = *xpnt1;
    yfrom = *ypnt1;
    move_abs_2(xfrom, yfrom);
}

/* */ 

strtpt3(xpnt1, ypnt1, zpnt1)
/*
    plots a point
*/
float *xpnt1, *ypnt1, *zpnt1;
{
    float xfrom, yfrom, zfrom;

    xfrom = *xpnt1;
    yfrom = *ypnt1;
    zfrom = *zpnt1;
    move_abs_3(xfrom, yfrom, zfrom);
}

/* */ 

connpt(xpnt1, ypnt1)
/*
    actually plots a line
*/
float *xpnt1, *ypnt1;
{
    float xfrom, yfrom;

    xfrom = *xpnt1;
    yfrom = *ypnt1;
    line_abs_2(xfrom, yfrom);
}

/* */

connpt3(xpnt1, ypnt1, zpnt1)
/*
    actually plots a line
*/
float *xpnt1, *ypnt1, *zpnt1;
{
    float xfrom, yfrom, zfrom;

    xfrom = *xpnt1;
    yfrom = *ypnt1;
    zfrom = *zpnt1;
    line_abs_3(xfrom, yfrom, zfrom);
}

/* */

marker_(isym)
/*
    define a marker symbol
*/
int *isym;
{
    int sym, j;

    sym = *isym;
    SETSYMBOL = TRUE;
    j = sym - 1;
    set_marker_symbol(marksym[j]);
}

/* */ 

endpl(iplot)
/*
    end a plot nothing to do
*/
int *iplot;
{
/*  color(1);
    clear();  */
}

/* */ 

donepl()
/*
    totally finished plotting
*/
{
    extern struct vwsurf vsurf;

    deselect_view_surface(&vsurf);
    terminate_core();
}


gethei()
/*
    sets character size of text, default set in setup at .014
    of an inch
*/
{
    float hite2;

    hite2 = charheidef;
    return *((int *)&hite2);
}

/* */ 

savescreen(filenam, len)
/*
    save a screen in a file
*/
char filenam[];
int *len;
{
    struct {
        int width, height, depth;
        short *bits;
    } raster;
    struct {
        int type;
        int nbytes;
        char *data;
    } map;
    char filetit[20];
    int rasfid;
    int replicate;
    int strlen;
    int i;
    float wx, wy;
/*
    save color map
*/
/*  colmap[0] = &red[0];
    colmap[1] = &green[0];
    colmap[2] = &blue[0];
*/
    strlen = *len;
    for (i = 0; i <= strlen; i++)
        filetit[i] = filenam[i];
    filetit[i] = '\0';
    replicate = 2;
/*
    get starting location for save
*/
    map_ndc_to_world_2(.0, .0, &wx, &wy);
    move_abs_2(wx, wy);
/*
    make sure memory is allocated
*/
    size_raster(&vsurf, .0, 1., .0, .75, &raster);
    allocate_raster(&raster);
    if (raster.bits == NULL) {
        printf("failed to allocate raster\n");
    }
    else {
        get_raster(&vsurf, .0, 1., .0, .75, 0, 0, &raster);
/*
    save it
*/
        if ((rasfid = open(filetit, 1)) == -1) {    /* open the disk file */
        }
        if (rasfid != -1) {
            map.type = 1; map.nbytes = 0;
            raster_to_file(&raster, &map, rasfid, replicate);
            close(rasfid);
            free_raster(&raster);
        }
    }
}

/* */

color(index)
/*
    sets the color and fill and text
*/
int *index;
{
    int ind;

    ind = *index;
    set_line_index(ind);
    set_fill_index(ind);
    set_text_index(ind);
}

/* */

settex(index)
/*
    sets the color of the text
*/
int *index;
{
    int ind;

    ind = *index;
    set_text_index(ind);
}

/* */

polfil(index)
/*
    sets the color of the fill area
*/
int *index;
{
    int ind;

    ind = *index;
    set_fill_index(ind);
}

/* */

clear()
/*
    clear screen
*/
{
    delete_all_retained_segments();
    new_frame();
}

/* */

linsty_(style)
/*
    set linestyle for the line ie
    style (dot dash etc)
*/
int *style;
{
    int linesty;

    linesty = *style;
    linesty = linesty - 1;
    if (linesty > 5)
        linesty = 0;
    set_linestyle(linesty);
    linestyle = linesty;
}

/* */ 

linwid_(thick)
/*
    set width for the line
*/
float *thick;
{
    float thickness;
    int width;

    thickness = *thick;
    if (thickness <= 1.)
        width = 0;
    else
        width = 1;
    set_linewidth(width);
}

/* */

lineatt_(style, thick)
/*
    set attributes for the line ie
    style (dot dash etc and the thickness)
*/
float *style;
int *thick;
{
    int thickness;
    float linestyle;

    linestyle = *style;
    thickness = *thick;
    if (linestyle > 3)
        linestyle = 0.;
    set_linestyle(linestyle);
    set_linewidth(thickness);
}

/* */ 

setcol(redval, grnval, bluval, loc)
/*
    sets color in rgb mode
*/
float *redval, *grnval, *bluval;
int *loc;
{
    float redcol, grncol, blucol;
    int ict;

    redcol = *redval;
    grncol = *grnval;
    blucol = *bluval;
    ict = *loc;

    red[ict] = redcol;
    green[ict] = grncol;
    blue[ict] = blucol;
}

/* */ 

rgbcol_(r, g, b)
/*
    set a color and then change to it
*/
float *r, *g, *b;
{
    red[255] = *r;
    green[255] = *g;
    blue[255] = *b;
    define_color_indices(&vsurf, 0, MAPSIZE, red, green, blue);
}

/* */


getrgb_(icol, rval, grnval, bluval)
/*
    get rgb values for index in color map
*/
float *rval, *grnval, *bluval;
int *icol;
{
    int index;

    index = *icol;
    /* store through the pointers so the caller receives the values */
    *rval = red[index];
    *grnval = green[index];
    *bluval = blue[index];
}

/* */

polsur(proper, iproper)
/*
    set shading properties
*/
float proper[];
int iproper[];
{
    float ambient, diffuse, secular, flood, bump;
    int hue, style;

    ambient = proper[0];
    diffuse = proper[1];
    secular = proper[2];
    flood = proper[3];
    bump = proper[4];
    hue = 1;
    style = iproper[1];
}

/* */ 


poly2(num, xarr, yarr)
/*
    draw a polygon
*/
float xarr[], yarr[];
int *num;
{
    int numb;

    numb = *num;
    /* pass the caller's coordinate arrays straight through */
    polygon_abs_2(xarr, yarr, numb);
}

/* */ 

polf(num, xarr, yarr, zarr)
/*
    draw a polygon
*/
float xarr[], yarr[], zarr[];
int *num;
{
    int numb;

    numb = *num;
    /* pass the caller's coordinate arrays straight through */
    polygon_abs_3(xarr, yarr, zarr, numb);
}

/* */ 

mapcol(index, rdval, grnval, bluval)
/*
    create color map
*/
int *index;
float *rdval, *grnval, *bluval;
{
    float redrgb, grnrgb, blurgb;
    int count;

    count = *index;
    red[count] = *rdval;
    blue[count] = *bluval;
    green[count] = *grnval;
}


vecto2(xstart, ystart, xend, yend)
/*
    draw a 2 d vector
*/
float *xstart, *ystart, *xend, *yend;
{
    float xs, ys, xe, ye;

    xs = *xstart;
    ys = *ystart;
    xe = *xend;
    ye = *yend;
    move_abs_2(xs, ys);
    line_abs_2(xe, ye);
}

/* */

point3(x, y, z)
/*
    make a point at the given coordinates
*/
float *x, *y, *z;
{
    float xpos, ypos, zpos;
    int dot;

    dot = 46;
    xpos = *x;
    ypos = *y;
    zpos = *z;
    set_marker_symbol(dot);
    marker_abs_3(xpos, ypos, zpos);
}

/* */

point2_(x, y)
/*
    make a point at the given coordinates
*/
float *x, *y;
{
    float xpos, ypos;
    int dot;

    dot = 46;
    xpos = *x;
    ypos = *y;
    set_marker_symbol(dot);
    marker_abs_2(xpos, ypos);
}

pushma()
{
}

popatt_()
{
}

tmpfile()
{
}

popmat()
{
}

/* */ 

height(num)
/*
    change height of numbers
*/
float *num;
{
    float multiple;

    multiple = *num;
    charheidef = charheidef * multiple;
    charwiddef = charwiddef * multiple;
}


Appendix II 
SUMMARY

An efficient algorithm designed to be used for Navier-Stokes simulations of complex
flows over complete configurations is presented and evaluated. The algorithm
incorporates a number of elements, including an explicit three-dimensional flow solver,
embedded mesh refinements, a model equation hierarchy ranging from the Euler
equations through the full Navier-Stokes equations, multiple-grid convergence
acceleration and extensive vectorization and multitasking for efficient execution on
parallel-processing supercomputers. Results are presented for a problem representative
of turbomachinery applications. Based on the performance data available at this
writing, it is expected that the final version of this paper will report overall speedups
ranging as high as 100.


INTRODUCTION 


Numerical flow simulation is becoming indispensable in aerodynamic design.
Because of the large economic benefit resulting from the intelligent use of
computational aerodynamics, significant resources are being committed to its
application to component design and integration. A recent survey [1] of the work
that led to the Boeing 757 and 767 aircraft reveals that computational methods
contributed to the design of nearly every aerodynamic component.

The successes attained thus far have raised expectations and established the
numerical simulation of complex flows over complete configurations as the next
objective. Such capability will permit the design of aerodynamic devices as
entire entities, rather than as individual components with their interactions
taken into account only through a posteriori modification and integration
techniques, as is current practice. This latest goal of computational aerodynamics is
being actively pursued. In fact, a first attempt at solving the Navier-Stokes
equations for the flowfield around a complete aircraft has already been reported
[2].


It is generally recognized that a comprehensive approach to the simulation 
of flows involving both complex geometries and complex physics will require 
powerful advanced-architecture supercomputers with very large memories. 
Machines capable of producing solutions to Reynolds-averaged Navier-Stokes 
flows over complex geometries within computing times short enough to be of 
design interest are expected to be available by the end of this decade [3]. In 
order to use these parallel-processing supercomputers effectively, algorithms 
must be adapted to focus the power of multiple processing units on a single 
flow simulation. Furthermore, the history of computational aerodynamics 
teaches that the pace of progress in this field is set by the synergism between 
improved computers and better algorithms. In the past 15 years, improved 
computers have reduced the cost of computation by a factor of about 100. 
Over the same period, better algorithms have reduced the cost of computation 
on a given computer by a factor of almost 1000 [4]. Thus, it is to be expected
that the need for faster algorithms will not be diminished by the availability of 
faster and larger computers. 
The purpose of the present work is to contribute to the development of
algorithms appropriate for the simulation of complex flows over complete
configurations. Such algorithms must be efficient and must map readily onto
the architectures of parallel-processing supercomputers. The approach selected
enhances the efficiency of a robust and flexible solution procedure by
implementing it on a collection of local meshes embedded in a global mesh. Either
the Euler, thin-layer Navier-Stokes or full Navier-Stokes equations are solved
on each mesh. The choice of model equations is determined by the nature of
the flow physics to be resolved on a particular mesh. When the requirement
for time accuracy is relaxed, a convergence acceleration procedure is applied
simultaneously to all meshes and all model equations. The entire algorithm is
explicit and is designed to perform well on computers consisting of multiple
processing units, each having vector processing capability. Examples of such
machines are the Cray X-MP and Cray-2.

Three-dimensional Navier-Stokes simulations using implicit methods have 
been reported recently by several investigators. These include: Aki and 
Yamada [5], Deiwert, et al. [6], Flores [7], Fujii and Obayashi [8], Holst, et al. 
[9], Hung [10], Kordulla [11] and Li [12]. Recent 3-D Navier-Stokes simula-
tions using explicit methods include the work of: Johnson and Swisshelm [13],
Mikartarian, et al. [14], Roger, et al. [15] and Shang and Scherr [2]. Interest- 
ing 3-D Euler calculations include publications by Koeck and Chattot [16] and 
Rizzi and Purcell [17]. Some form of three-dimensional zonal gridding or grid 
embedding has been emphasized in [7], [9] and by Benek, et al. [18]. Interest-
ing work on grid embedding in two dimensions has been carried out by Usab
[19] and Eberhardt and Baganoff [20]. Convergence acceleration has been 
stressed in [7], [13], [16] and [19]. The implementation of Navier-Stokes algo- 
rithms on parallel-processing supercomputers has been discussed by Johnson, 
et al. [21] and Stevens [22]. 

It thus appears that the time is ripe for the introduction of a solution 
methodology for complex, three-dimensional viscous flows which embodies the 
following elements: a robust basic flow solver, embedded meshes, a hierarchy 
of physical flow models, convergence acceleration and multitasking for efficient 
execution on parallel-processing supercomputers. Such a methodology is
described in this paper. Computational results are presented for a three-
dimensional flow problem related to turbomachinery applications.


EQUATIONS OF MOTION 


The nondimensional Reynolds-averaged Navier-Stokes equations may be
written in conservation-law form as

$$ q_t = -(F_x + G_y + H_z) \tag{1} $$


where, for the full Navier-Stokes equations, 

$$ F = f - Re^{-1} p, \qquad G = g - Re^{-1} r, \qquad H = h - Re^{-1} s $$

while, for their thin-layer version,

$$ F = f, \qquad G = g, \qquad H = h - Re^{-1} d $$

and, for the Euler equations,

$$ F = f, \qquad G = g, \qquad H = h $$

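where:

$$
\begin{aligned}
\tau_{xx} &= \lambda (u_x + v_y + w_z) + 2\mu u_x, & \beta_x &= \gamma \kappa Pr^{-1} e_x + u\tau_{xx} + v\tau_{xy} + w\tau_{xz} \\
\tau_{yy} &= \lambda (u_x + v_y + w_z) + 2\mu v_y, & \beta_y &= \gamma \kappa Pr^{-1} e_y + u\tau_{yx} + v\tau_{yy} + w\tau_{yz} \\
\tau_{zz} &= \lambda (u_x + v_y + w_z) + 2\mu w_z, & \beta_z &= \gamma \kappa Pr^{-1} e_z + u\tau_{zx} + v\tau_{zy} + w\tau_{zz} \\
\tau_{xy} &= \tau_{yx} = \mu (u_y + v_x) \\
\tau_{xz} &= \tau_{zx} = \mu (u_z + w_x) \\
\tau_{yz} &= \tau_{zy} = \mu (v_z + w_y)
\end{aligned}
$$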

Here $\rho$, $u$, $v$, $w$, $p$ and $E$ are respectively density, velocity components in the
x-, y- and z-directions, pressure and total energy per unit volume. This final
quantity may be expressed as

$$ E = \rho \left[ e + \tfrac{1}{2} (u^2 + v^2 + w^2) \right] $$

where the specific internal energy, $e$, is related to the pressure and density by
the simple law of a calorically-perfect gas

$$ p = (\gamma - 1) \rho e $$

with $\gamma$ denoting the ratio of specific heats. The coefficient of thermal conductivity,
$\kappa$, and the viscosity coefficients, $\lambda$ and $\mu$, are assumed to be functions
only of temperature. Furthermore, by invoking Stokes' assumption of zero
bulk viscosity, $\lambda$ may be expressed in terms of the dynamic viscosity $\mu$ as

$$ \lambda = -\tfrac{2}{3} \mu $$

Re and Pr denote the Reynolds and Prandtl numbers, respectively.

Although, for simplicity, the equations of motion are presented here
in Cartesian coordinates, it is well known that their strong conservation law
form may be maintained under an arbitrary time-dependent transformation of
coordinates. Explicit detail concerning the generalized-coordinate version of
these equations, which is employed in the computations to be discussed subse-
quently, is available in [23].
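
The perfect-gas relations given above can be exercised with a short sketch. This is only an illustration: the function names are hypothetical, and the value 1.4 for the ratio of specific heats is an assumption (the text does not state one).

```python
GAMMA = 1.4  # assumed ratio of specific heats (air); the text does not give a value


def total_energy(rho, u, v, w, p, gamma=GAMMA):
    """Total energy per unit volume, E = rho * (e + (u^2 + v^2 + w^2) / 2)."""
    e = p / ((gamma - 1.0) * rho)  # specific internal energy from p = (gamma - 1) rho e
    return rho * (e + 0.5 * (u * u + v * v + w * w))


def pressure(rho, u, v, w, E, gamma=GAMMA):
    """Invert the relation above: recover p from density, velocity and total energy."""
    e = E / rho - 0.5 * (u * u + v * v + w * w)
    return (gamma - 1.0) * rho * e
```

A flow solver that carries $E$ as a dependent variable would typically call something like `pressure` once per grid point before evaluating the fluxes.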

Note that, in practice, the thin-layer assumption is implemented by using a 
body-fitted coordinate system and neglecting the viscous terms in the coordi- 
nate directions along the body. For Cartesian coordinates, with x and y 
representing the body-conforming coordinates, the thin-layer version of the 
Navier-Stokes equations is as given above. 

The effects of turbulence are simulated by means of a two-layer algebraic 
eddy viscosity model. In the stress terms of the Navier-Stokes equations, the
coefficient of dynamic viscosity, $\mu$, is replaced by $\mu + \mu_t$, where $\mu_t$ is the
coefficient of eddy viscosity. Similarly, in the heat flux terms, the coefficient
of thermal conductivity, $\kappa$, is augmented by a turbulent contribution
proportional to $\mu_t / Pr_t$, where $Pr_t$ is the turbulent Prandtl number.
The eddy viscosity is determined by the method of
Baldwin and Lomax [24].

SOLUTION METHODOLOGY 

In order to minimize the cost of simulating complex, three-dimensional 
viscous flow over complete configurations, the following strategy has been 
developed: 

a. Use a robust and flexible explicit flow solver capable of simulating either 
steady or unsteady flow with the Euler, thin-layer Navier-Stokes or full 
Navier-Stokes equations. 

b. Distribute grid points optimally by making use of locally-embedded grid 
refinements. 

c. Make use of a flow simulation hierarchy ranging from the Euler equations
through the full Navier-Stokes equations in order to minimize the computa-
tional work per grid point.

d. Accelerate the convergence of steady flow simulations by means of an expli- 
cit multiple-grid technique which may be applied, without modification, to 
the entire hierarchy of model equations. 

e. Take advantage of the explicit nature of the algorithm by mapping it onto a
supercomputer architecture consisting of multiple vector-processing CPUs
and thus enhance its performance by means of both vectorization and mul-
titasking.

Further detail concerning this strategy is provided below. 

Basic Flow Solver 

The integration scheme used here is the two-step Lax-Wendroff method 
due to MacCormack [25]. The forward predictor - backward corrector version 
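of this method may be written as

$$ \Delta \bar{q}_{i,j,k} = -\Delta t \left[ \frac{F^n_{i+1,j,k} - F^n_{i,j,k}}{\Delta x} + \frac{G^n_{i,j+1,k} - G^n_{i,j,k}}{\Delta y} + \frac{H^n_{i,j,k+1} - H^n_{i,j,k}}{\Delta z} \right] $$

$$
\begin{aligned}
\delta q_{i,j,k} = -\frac{\Delta t}{2} \Big\{ & \frac{1}{\Delta x} \left[ (F^n_{i+1,j,k} - F^n_{i,j,k}) + (\bar{F}_{i,j,k} - \bar{F}_{i-1,j,k}) \right] \\
+ & \frac{1}{\Delta y} \left[ (G^n_{i,j+1,k} - G^n_{i,j,k}) + (\bar{G}_{i,j,k} - \bar{G}_{i,j-1,k}) \right] \\
+ & \frac{1}{\Delta z} \left[ (H^n_{i,j,k+1} - H^n_{i,j,k}) + (\bar{H}_{i,j,k} - \bar{H}_{i,j,k-1}) \right] \Big\}
\end{aligned}
$$

where:

$$ \delta q_{i,j,k} = q_{i,j,k}(t + \Delta t) - q_{i,j,k}(t) $$

$$ \bar{q}_{i,j,k} = q^n_{i,j,k} + \Delta \bar{q}_{i,j,k} $$

$$ F^n_{i,j,k} = F(q^n_{i,j,k}), \quad G^n_{i,j,k} = G(q^n_{i,j,k}), \quad H^n_{i,j,k} = H(q^n_{i,j,k}) $$

$$ \bar{F}_{i,j,k} = F(\bar{q}_{i,j,k}), \quad \bar{G}_{i,j,k} = G(\bar{q}_{i,j,k}), \quad \bar{H}_{i,j,k} = H(\bar{q}_{i,j,k}) $$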

First derivatives in the viscous terms are backward differenced in the predictor 
and forward differenced in the corrector. 
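
The forward predictor - backward corrector structure can be illustrated in one space dimension. The sketch below is not the three-dimensional solver used in this work; it assumes a scalar conservation law with a user-supplied flux function on a periodic grid, and the names `maccormack_step` and `flux` are hypothetical.

```python
def maccormack_step(q, flux, dt, dx):
    """Advance q_t + f(q)_x = 0 by one MacCormack step on a periodic 1-D grid."""
    n = len(q)
    f = [flux(v) for v in q]
    # Predictor: forward difference of the fluxes at time level n.
    q_bar = [q[i] - dt / dx * (f[(i + 1) % n] - f[i]) for i in range(n)]
    f_bar = [flux(v) for v in q_bar]
    # Corrector: backward difference of the predicted fluxes, then average.
    # Index i - 1 wraps to the last point when i == 0, which gives periodicity.
    return [
        0.5 * (q[i] + q_bar[i] - dt / dx * (f_bar[i] - f_bar[i - 1]))
        for i in range(n)
    ]
```

For linear advection, `flux = lambda v: v`, with `dt == dx` the scheme shifts the profile by exactly one grid point per step, which makes a convenient sanity check.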

This version of MacCormack’s scheme is used here for convenience. Any 
of its many variants could also be used, as could any other one- or two-step 
Lax-Wendroff scheme [26]. In fact, the class of fine-grid methods to which
the convergence acceleration technique described below may be applied appears
to be quite large, including schemes not of Lax-Wendroff type [27].

The advantages of MacCormack’s method, in the present context, are its 
explicit nature, simplicity and low operations count. A disadvantage is its con- 
ditional stability and the severe time-step size limitation which this imposes for 
viscous flows, in particular. The ill effects of conditional stability are mitigated 
through the use of embedded grid refinements and convergence acceleration. 

Embedded Meshes 

The embedded-mesh technique developed for the present application is a 
generalization of that employed in [19] to obtain two-dimensional Euler solu- 
tions. The computational domain is divided into regions requiring grids of 
differing fineness and the resolution of different flow physics. At present, for 
simplicity, this partitioning is done a priori. However, solution-adaptive gridding
based on this technique is possible. Figures 1 through 5 show typical locations
for the mesh regions employed in the computations described subsequently
in this paper. Note that, where mesh lines are illustrated, their spacing
is much coarser than that employed in the computations. This has been done
for clarity of illustration. Figure 1 shows mesh 3, in white. This mesh covers
the entire computational domain. Figure 2 overlays, in yellow, the regions
collectively referred to as mesh 2. Figure 3 shows, in cyan, the mesh-1 regions
projected onto mesh 3. In Figure 4, meshes 1 through 3 are assembled into a
single illustration. Figure 5 simply adds wing surface grid lines to this assemblage.
The mesh-1 regions contain the finest grids. Mesh-2 regions are coarser
by a factor of 2 in each dimension. Mesh 3 is coarser again by a factor of 2 in
each dimension. From this example, it is easy to see that quite general collections
of embedded meshes may be constructed in this manner. The embedded
meshes are not disjoint. Rather, given a mesh labelled m, all coarser meshes
from m+1 through the coarsest mesh used in the computation underlie it.
This property, together with the coarsening factor of 2, facilitates the use of the
multiple-grid convergence acceleration technique described below.

Points on a boundary between two mesh regions belong to the coarser 
region. Where mesh regions reach the boundaries of the computational 
domain, boundary conditions are applied consistent with the finest mesh reach- 
ing the boundary segment in question. The flowfield updating begins with 
mesh 1. After one timestep on mesh 1, mesh 2 is updated exterior to mesh 1 
while convergence acceleration is applied at the mesh-2 points interior to mesh 
1. Next, mesh 3 is updated exterior to mesh 2 while convergence acceleration 
is applied at the mesh-3 points interior to mesh 2. Updating proceeds in this 
fashion until the global mesh has been advanced by one timestep. Then the 
updating cycle is completed by applying convergence acceleration to coarsenings 
of the global grid. This cycle is repeated until the desired measure of conver- 
gence is satisfied. 
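
The ordering of operations within one updating cycle can be summarized by a small sketch that simply lists the steps in sequence. It performs no flow computation; the function name and the idea of passing the number of global-grid coarsenings as a parameter are assumptions made for illustration.

```python
def embedded_update_cycle(num_meshes, num_global_coarsenings):
    """List, in order, the operations of one updating cycle on embedded meshes.

    Mesh 1 is the finest; mesh `num_meshes` is the global mesh.
    """
    steps = ["time step on mesh 1"]
    for m in range(2, num_meshes + 1):
        # Mesh m is advanced only where no finer mesh overlies it ...
        steps.append(f"time step on mesh {m} exterior to mesh {m - 1}")
        # ... and receives a coarse-grid correction where mesh m-1 overlies it.
        steps.append(
            f"convergence acceleration at mesh-{m} points interior to mesh {m - 1}"
        )
    for c in range(1, num_global_coarsenings + 1):
        steps.append(f"convergence acceleration on coarsening {c} of the global mesh")
    return steps
```

Calling `embedded_update_cycle(3, 2)` reproduces the three-mesh case described above, followed by two further coarsenings of the global grid.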

Simulation Hierarchy 

In order to minimize the computational work required to obtain a flowfield solution
of specified accuracy and physical resolution, different approximations to the
Navier-Stokes equations are used in different mesh regions. In the example
illustrated in Figures 1 through 5, the full Navier-Stokes equations are solved 
on mesh 1, the thin-layer Navier-Stokes equations are solved on mesh 2 and 
the Euler equations are solved on mesh 3. In this way the physics contained in 
the model equations and the resolving power of the various meshes are 
matched to each other.

Convergence Acceleration


Given the fine-grid corrections, $\delta q$, we wish to use successively coarser
grids to propagate this information throughout the computational domain, thus
accelerating convergence to the steady state while maintaining the accuracy
determined by the fine-grid discretization. Given a basic fine grid with the
number of points in each direction expressible as $n \, 2^p + 1$ for $p$ and $n$ integers
such that $p \ge 0$ and $n \ge 2$, where $p$ is the number of grid coarsenings and $n$ is
the number of coarsest-grid intervals, let successively coarser grids be defined
by successive deletion of every other point in each coordinate direction.
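
As a small illustration of this grid hierarchy, the sketch below lists the number of points per direction on each grid and performs the deletion of every other point on a one-dimensional list of points. The function names are hypothetical.

```python
def grid_sizes(n, p):
    """Points per direction on each grid, finest first: n*2^p + 1 down to n + 1."""
    if p < 0 or n < 2:
        raise ValueError("require p >= 0 and n >= 2")
    return [n * 2**k + 1 for k in range(p, -1, -1)]


def coarsen(points):
    """Delete every other point, keeping both end points of an odd-length list."""
    return points[::2]
```

For example, `grid_sizes(2, 3)` gives 17, 9, 5 and 3 points per direction, and applying `coarsen` three times to a list of 17 points yields lists of those same lengths.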

Although it is quite probable that a large variety of coarse-grid acceleration 
schemes may be constructed, we limit our attention here to those explicit 
schemes based on Lax-Wendroff methods. Such an acceleration scheme may 
be expressed as 

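$$ \delta q_{coarse} = \Delta t \, q_t + \frac{\Delta t^2}{2} q_{tt} $$

By introducing Eqn.(1), this may be rewritten as

$$ \delta q_{coarse} = -\Delta t (F_x + G_y + H_z) - \frac{\Delta t^2}{2} (F_x + G_y + H_z)_t $$

Observing that

$$ \Delta q = -\Delta t (F_x + G_y + H_z) $$

we obtain

$$ \delta q_{coarse} = \Delta q - \frac{\Delta t^2}{2} (F_x + G_y + H_z)_t \tag{3} $$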

Various acceleration schemes may now be derived, according to the way in 
which the second term on the right-hand side of Eqn.(3) is treated.

If we let

$$ -(F_x + G_y + H_z)_t = [A(F_x + G_y + H_z)]_x + [B(F_x + G_y + H_z)]_y + [C(F_x + G_y + H_z)]_z $$

where A, B and C are the Jacobian matrices

$$ A = \partial F / \partial q, \quad B = \partial G / \partial q \quad \text{and} \quad C = \partial H / \partial q $$

we obtain the class of Jacobian-based acceleration schemes, of which the 
method due to Ni [28] is a member. 
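
As a consistency check on the Jacobian-based form, the following worked step may be helpful. It is a sketch that assumes $\Delta t$ is uniform in space, so that it can be moved through the spatial derivatives. By the chain rule $F_t = A q_t$, and Eqn.(1) gives $q_t = -(F_x + G_y + H_z) = \Delta q / \Delta t$. Substituting into Eqn.(3):

$$
\begin{aligned}
\delta q_{coarse} &= \Delta q + \frac{\Delta t^2}{2} \left\{ [A(F_x + G_y + H_z)]_x + [B(F_x + G_y + H_z)]_y + [C(F_x + G_y + H_z)]_z \right\} \\
&= \Delta q - \frac{\Delta t}{2} \left[ (A \Delta q)_x + (B \Delta q)_y + (C \Delta q)_z \right]
\end{aligned}
$$

In this form the coarse-grid correction depends only on the restricted change $\Delta q$ and the Jacobians, which is what allows it to be evaluated without recomputing residuals on the coarse grid.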

If, on the other hand, we let 

$$ (F_x + G_y + H_z)_t \approx \frac{1}{\Delta t} \left[ (F_x + G_y + H_z)^{n+1} - (F_x + G_y + H_z)^n \right] $$

where

$$ F^n = F(q^n), \quad G^n = G(q^n), \quad H^n = H(q^n) $$

$$ F^{n+1} = F(q^n + \Delta q), \quad G^{n+1} = G(q^n + \Delta q), \quad H^{n+1} = H(q^n + \Delta q) $$

we obtain the class of flux-based schemes introduced in [29].


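In both classes of schemes, $\Delta q$ is approximated by a restriction of the fine-
grid value of $\delta q$ and second-order accurate spatial differencing is used. For
example, a simple Jacobian-based acceleration scheme may be written as

$$ \delta q_{i,j,k} = \frac{1}{8} \sum_{l=\pm 1} \sum_{m=\pm 1} \sum_{n=\pm 1} \left[ I - l \frac{\Delta t}{\Delta x} A - m \frac{\Delta t}{\Delta y} B - n \frac{\Delta t}{\Delta z} C \right] \Delta q_{i+l,\, j+m,\, k+n} $$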


This scheme is contrasted in Fig. 6 with its one-step Lax-Wendroff analog, writ-
ten on the fine grid. That one-step scheme may be written as

$$ \delta q_{i,j,k} = \frac{1}{8} \sum_{l=\pm 1} \sum_{m=\pm 1} \sum_{n=\pm 1} \left[ I - l \frac{\Delta t}{\Delta x} A - m \frac{\Delta t}{\Delta y} B - n \frac{\Delta t}{\Delta z} C \right] \Delta q_{i+\frac{l}{2},\, j+\frac{m}{2},\, k+\frac{n}{2}} $$


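where $\Delta q$ is not approximated as a restriction of some $\delta q$, as in the coarse-grid
scheme, but is rather computed as

$$
\begin{aligned}
\Delta q_{i+\frac{1}{2},\, j+\frac{1}{2},\, k+\frac{1}{2}} =
& -\frac{\Delta t}{4 \Delta x} \left[ (F_{i+1,j,k} + F_{i+1,j+1,k} + F_{i+1,j,k+1} + F_{i+1,j+1,k+1}) - (F_{i,j,k} + F_{i,j+1,k} + F_{i,j,k+1} + F_{i,j+1,k+1}) \right] \\
& -\frac{\Delta t}{4 \Delta y} \left[ (G_{i,j+1,k} + G_{i+1,j+1,k} + G_{i,j+1,k+1} + G_{i+1,j+1,k+1}) - (G_{i,j,k} + G_{i+1,j,k} + G_{i,j,k+1} + G_{i+1,j,k+1}) \right] \\
& -\frac{\Delta t}{4 \Delta z} \left[ (H_{i,j,k+1} + H_{i+1,j,k+1} + H_{i,j+1,k+1} + H_{i+1,j+1,k+1}) - (H_{i,j,k} + H_{i+1,j,k} + H_{i,j+1,k} + H_{i+1,j+1,k}) \right]
\end{aligned}
$$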
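
The structure of the one-step scheme, in which cell-centered changes are first computed from flux differences and then distributed to the surrounding nodes with Jacobian-dependent weights, is easiest to see in one dimension. The sketch below is an illustration only: it assumes linear advection, $q_t + a q_x = 0$, on a periodic grid, so that the Jacobian reduces to the scalar `a` and the eight-cell average reduces to a two-cell average with prefactor 1/2. The function name is hypothetical.

```python
def lax_wendroff_distribution_step(q, a, dt, dx):
    """One-step Lax-Wendroff update written as 'cell change + distribution'."""
    n = len(q)
    nu = a * dt / dx
    # Change in the cell between nodes i and i+1: -(dt/dx) * (F[i+1] - F[i]), F = a*q.
    dq_cell = [-nu * (q[(i + 1) % n] - q[i]) for i in range(n)]
    # Node i collects from its right cell (l = +1, weight 1 - nu) and its
    # left cell (l = -1, weight 1 + nu); dq_cell[i - 1] wraps periodically.
    return [
        q[i] + 0.5 * ((1.0 - nu) * dq_cell[i] + (1.0 + nu) * dq_cell[i - 1])
        for i in range(n)
    ]
```

Expanding the weights recovers the familiar Lax-Wendroff stencil, a centered first difference plus a second-difference term scaled by `nu**2 / 2`, which is the one-dimensional counterpart of the relationship between the two schemes contrasted above.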


All results reported in this paper use flux-based convergence acceleration. 

In the sequential grid updating algorithm, illustrated in Fig. 7a, the solution
is advanced over one multiple-grid cycle as follows. First a fine-grid correction,
$\delta q_1$, is computed. Then $\delta q_1$ is restricted to the next-coarser grid, where $\delta q_2$ is
computed. The $\delta q_2$ correction is both restricted to grid 3 and prolonged to grid
1, where it provides an additional update to the fine-grid solution. On grids 3
through N-1 the procedure is analogous to that on grid 2. When $\delta q_N$ has been
computed and prolonged to grid 1 to provide the Nth update to the fine-grid
solution, the next multiple-grid cycle is ready to begin.

Observe that when the components of the sequential grid updating algo- 
rithm (namely, the fine- and coarse-grid schemes) are both explicit, it is partic- 
ularly easy to vectorize. However, the effectiveness of vectorizing the coarse- 
grid scheme is limited by the progressively shorter vectors which may be con- 
structed on the successively coarser grids. Such an explicit sequential algorithm 
may also be run on a Multiple Instruction-Multiple Data (MIMD) machine by 
splitting each grid, in turn, across the total number of processors available. An
implicit sequential grid updating algorithm would probably not vectorize as well
and would also require additional redesign to run on a parallel processor.

The parallel coarse-grid algorithm, illustrated in Fig. 7b, removes the 
dependence of grids 3 through N upon their immediate predecessors. In partic-
ular, $\delta q_1$ is now restricted to each of grids 2 through N. All of these coarse
grids may then be updated simultaneously and independently of each other. 
This allows the mesh points on grids 2 through N to be assembled into one 
vector in order to improve performance on a Single Instruction-Multiple Data 
(SIMD) computer. Alternatively, the coarse grids could each be updated 
simultaneously on separate processors of an MIMD machine. This would be 
attractive, for example, if the coarse-grid scheme were implicit. 

A further possibility is the fully parallel algorithm, illustrated in Fig. 7c. 
Here $\delta q_1$ from the previous cycle is restricted to each of the coarse grids. This
makes all of the grids 1 through N independent of each other and allows their 
simultaneous update. 
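
The difference between the three updating algorithms lies in where each coarse grid obtains its restricted correction, and hence in which grids may be updated concurrently. The sketch below encodes only that dependency structure; the function names and the string labels for the algorithms are hypothetical.

```python
def restriction_source(algorithm, grid):
    """Return (source_grid, cycle) supplying the restricted correction to `grid`."""
    if grid < 2:
        raise ValueError("only coarse grids (grid >= 2) receive restricted data")
    if algorithm == "sequential":
        return (grid - 1, "current")   # each grid waits for its predecessor
    if algorithm == "parallel_coarse":
        return (1, "current")          # all coarse grids wait only for grid 1
    if algorithm == "fully_parallel":
        return (1, "previous")         # no grid waits for anything in this cycle
    raise ValueError(f"unknown algorithm: {algorithm}")


def concurrent_groups(algorithm, num_grids):
    """Group grids 1..num_grids into sets that may be updated simultaneously."""
    grids = list(range(1, num_grids + 1))
    if algorithm == "sequential":
        return [[g] for g in grids]
    if algorithm == "parallel_coarse":
        return [[1], grids[1:]]
    if algorithm == "fully_parallel":
        return [grids]
    raise ValueError(f"unknown algorithm: {algorithm}")
```

For a five-grid sequence, `concurrent_groups("parallel_coarse", 5)` returns grid 1 followed by grids 2 through 5 as a single concurrent group.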

Observe that dissipative effects have a local character and their influence 
need not be taken into account in the construction of acceleration schemes. 
Rather, it is the convective terms, with their global character, which are the key 
element in coarse-grid propagation. Hence, acceleration schemes for viscous
flow computations may be formulated on the basis of the inviscid equations of
motion. Such a scheme leads to a convergence acceleration procedure which is
independent of the nature of the dissipative terms retained in the viscous 
model equations. That is to say: a scheme based on the Euler equations may 
be employed, without modification, to accelerate the convergence of viscous 
flow computations based on the Navier-Stokes equations, the thin-layer equa- 
tions, or any other viscous model equations which contain the full inviscid 
Euler equations. 

Boundary conditions are not updated during convergence acceleration. This 
has the advantage of decoupling the acceleration scheme from both the physical 
and numerical nature of these boundary conditions. That is to say: the 
acceleration scheme always sees a Dirichlet problem. Any numerical damping 
terms which may be necessary are also omitted during convergence accelera- 
tion. This enhances the modularity of the acceleration scheme. 

For all of the results to be discussed subsequently, linear interpolation has 
been used as the prolongation operator. For the sequential grid updating algo- 
rithm, injection is used as the restriction operator. The parallel coarse-grid 
algorithm introduces averaging into the restriction operator for grids 3 through 
N. The fully parallel algorithm further employs underrelaxation of the fine-grid 
information restricted from the previous cycle. 

Finally, note that, while in this paper multiple-grid convergence acceleration 
is applied only to steady flow simulations, it appears that the technique may 
extend to the time-accurate computation of some unsteady flows [30, 31]. 

Multitasking 

When attempting to multitask an algorithm for execution on an MIMD 
machine, we are concerned with multitasking overhead and algorithm granular- 
ity. By granularity we mean the time required to execute a multitaskable seg- 
ment of the algorithm on a single processor [32]. For a given multitasking 
overhead, the best speedup is obtained when algorithm granularity is maximal. 
Large granularity is usually introduced by top-down programming which
exploits global parallelism in the algorithm. Bottom-up programming, on the
other hand, exploits algorithm parallelism at a low level by making many parti- 
tionings, each on small code segments, such as DO loops containing indepen- 
dent statements [33] . 

The sequential multigrid algorithm contains many opportunities for creating 
small granularity parallelism but relatively few opportunities for the sort of 
large granularity necessary to produce good multitasking speedup in the face of 
non-trivial multitasking overhead. This observation, together with the desira- 
bility of non-sequential multigrid schemes for reasons of algorithm flexibility, 
led to the construction of the parallel multigrid algorithms described above. In 
these algorithms, grids which are independent of one another may be updated 
simultaneously on separate processors. In fact, such a simple strategy may 
result in a poor load balance across processors because of the differing amounts 
of work inherent in updating grids of different coarseness. However, more 
refined strategies are possible. Grids may, for example, be grouped together 
into tasks of approximately equal work, or they may be melded into tasks with 
other large-grained multitaskable code segments in order to equilibrate proces- 
sor loading. Notice further that, by multitasking large-grained structures, the 
vectorization potential of code within these structures remains intact. 
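
One way to group grids into tasks of approximately equal work is a greedy heuristic that assigns the largest remaining item to the least-loaded processor. This is offered only as an illustration; it is not stated to be the strategy used in this work, and the work estimates in the usage note are hypothetical numbers rather than measured costs.

```python
def balance_tasks(work, num_processors):
    """Greedy grouping: largest item first, always onto the least-loaded processor.

    `work` maps a task label to an estimated cost in arbitrary units.
    Returns (tasks_per_processor, load_per_processor).
    """
    loads = [0.0] * num_processors
    tasks = [[] for _ in range(num_processors)]
    for label, cost in sorted(work.items(), key=lambda kv: kv[1], reverse=True):
        p = loads.index(min(loads))  # first least-loaded processor
        tasks[p].append(label)
        loads[p] += cost
    return tasks, loads
```

With the hypothetical costs `{"grid 2": 5, "grid 3": 4, "grid 4": 3, "fine A": 3, "fine B": 2, "grid 5": 1}` on two processors, the heuristic produces two groups of equal total cost, whereas assigning one grid per processor would leave the loads unequal.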

NUMERICAL SIMULATION 

At present, all of the elements discussed in the section on solution metho- 
dology have been encoded and computationally verified in either two or three 
dimensions, or both. These elements are now being assembled into a 
comprehensive three-dimensional algorithm. Performance evaluation of this 
algorithm will be completed in March 1986 and reported in the final version of 
this paper. For this abstract, although the three-dimensional model problem 
designed to exercise all aspects of the completed algorithm is described, only 
interim results are discussed. The expected variance between interim and final 
results is indicated, where possible.

Physical Problem


As the algorithm described in this paper is designed to efficiently simulate 
complex flows over complete configurations, it should be tested under condi- 
tions which fully exercise its capabilities. On the other hand, excessive com- 
plexity would serve no useful purpose in the initial testing phases of the algo- 
rithm. With these considerations in mind, three-dimensional computations are 
being carried out for the geometry illustrated in Fig. 8, a rectilinear cascade of 
finite-span, swept blades mounted between endwalls. The sweep angle ranges 
from 0 to 26 degrees. The blade thickness to chord ratio ranges from 0.0 to 
0.2. The subcritical computations are performed at an isentropic inlet Mach 
number of 0.5. The Mach number for the supercritical computations is 0.675. 

In the viscous cases, the Reynolds numbers, based on cascade gap and critical
speed, span the approximate range from $8.4 \times 10^3$ to $2.0 \times 10^5$.

At the upstream domain boundary, total pressure, total temperature and 
flow angle are specified. At the downstream boundary, the static pressure is 
fixed. Along inviscid lateral boundaries, the tangency condition is applied, 
while, along solid walls, the no-slip condition is applied and the temperature 
specified. Symmetry and periodicity are invoked to limit the size of the compu- 
tational domain. Uniform flow at the isentropic inlet Mach number is used as 
an initial state. 

Algorithm Implementation 

The mesh structure on which the computations are performed is illustrated
in Figs. 1 through 5. The full Navier-Stokes equations are solved on mesh 1. 
The thin-layer Navier-Stokes equations are solved on mesh 2. The Euler equa- 
tions are solved on mesh 3. Only steady flows are computed and convergence 
acceleration, as described previously, is applied. The entire algorithm is vector- 
ized and multitasked to run on a four-processor Cray X-MP or Cray 2. 

Computational Results 

Sample flowfield results obtained using the basic flow solver with
convergence acceleration but without embedded mesh refinements are shown
in Figs. 9 through 12. The use of embedded meshes causes no deterioration of 
these results, assuming that the meshes are positioned properly. The final ver- 
sion of this paper will contain much more elaborate flowfield results, including 
depiction of corner vortices. 

Comparison of the embedded-mesh algorithm with a single-mesh algorithm 
yields the following general result: the accuracy of the embedded-mesh results 
is essentially that of a global finest mesh, while the convergence rate is like that 
of a global coarsest mesh. This observation is consistent with [19]. Thus far,
in two-dimensional computations using the Euler and thin-layer Navier-Stokes 
equations and three mesh regions, embedding speedups as high as 30 in com- 
parison to a single-mesh algorithm have been obtained. Some interim two- 
dimensional embedding speedups are reported in Table I. Three-dimensional 
embeddings using Euler, thin-layer and full Navier-Stokes regions should pro- 
duce substantially larger speedups. 

Multiple-grid convergence acceleration applied to three-dimensional cases, 
in the absence of mesh embedding, has yielded speedups ranging from 2.5 to 
4.7 (see Table II). It is expected that there will be some tradeoff between 
embedding speedup and multigrid speedup in the complete algorithm. 

Vectorization of the three-dimensional algorithm without embedded meshes 
results in speedups ranging from 3.6 to 5.7 (see Table III). This range should 
remain about the same in the final algorithm. 

Using a top-down multitasking approach, the parallel coarse-grid algorithm 
has been implemented on a four processor Cray X-MP, for two-dimensional 
cases without mesh embedding. Initially, only the coarse grids were multi- 
tasked so that the performance of parallel grids on a multiprocessor could be 
evaluated. Then the fine-grid computations were partitioned and multitasked, 
and the resultant code was integrated with the parallelized coarse grids. Load 
balancing of the entire scheme completed the study of performance resulting 
from the top-down approach.

Multitasking results are shown in Table IV. The performance measures are
based on a comparison of multitasked code segments with their unitasked ana- 
logs. The parallel coarse-grid scheme results contained in Table IVa were 
obtained with a five-grid multigrid sequence length. Observe that an efficiency 
of nearly 90% has been obtained using two processors, but that efficiency 
deteriorates to 77% when four processors are used. This deterioration is a 
result of distributing multigrid structures containing unequal amounts of work 
across four processors, which creates a less than ideal load balance. Table IVb 
shows results obtained from multitasking the fine-grid scheme. The fine-grid 
tasks are fairly evenly balanced, and this code segment performs well on both 
two and four processors. Processor utilization of 90% or better is achieved. 
When the four-processor case is recomputed using the low-overhead version of 
multitasking known as microtasking [34], efficiency is improved to 95%. The 
fully multitasked multigrid algorithm performance is shown in Table IVc. On 
two processors, 94% efficiency is obtained, while on four processors, with 
speedup by a factor of 3.3 over the unitasked code, efficiency is 83%. Multi- 
tasking speedups are expected to improve for the complete three-dimensional 
code. 
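
The efficiencies quoted above follow from the speedups in Table IV if efficiency is taken to be speedup divided by the number of processors. That definition is an assumption here (the text does not state it explicitly), but the short check below shows that it reproduces every tabulated value to within rounding.

```python
def efficiency(speedup, processors):
    """Parallel efficiency, assumed here to be speedup / number of processors."""
    return speedup / processors


# (processors, speedup, reported efficiency) taken from Tables IVa, IVb and IVc.
REPORTED = [
    (2, 1.78, 0.89), (4, 3.06, 0.77),   # IVa: coarse-grid scheme
    (2, 1.91, 0.96), (4, 3.58, 0.90),   # IVb: fine-grid scheme, multitasking
    (2, 1.93, 0.97), (4, 3.78, 0.95),   # IVb: fine-grid scheme, microtasking
    (2, 1.87, 0.94), (4, 3.30, 0.83),   # IVc: complete scheme
]

# Largest gap between computed and reported efficiency over all entries.
max_deviation = max(abs(efficiency(s, p) - e) for p, s, e in REPORTED)
```

The largest deviation is about 0.005, which is what two-decimal rounding of the tabulated values would produce.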

Given that the speedups from the various categories described above are
generally multiplicative in effect, it is conservatively estimated that the
completed algorithm to be reported in the final version of this paper will pro-
duce overall speedups ranging as high as 100.

CONCLUSIONS 

An efficient algorithm designed to be used for Navier-Stokes simulations of 
complex flows over complete configurations has been presented. 

The algorithm makes use of several elements: 

A robust explicit basic flow solver 

Locally-embedded mesh refinements 

A flow simulation hierarchy ranging from the Euler 
equations through the full Navier-Stokes equations

An explicit multiple-grid convergence acceleration
technique 

Both vectorization and multitasking for efficient 
execution on parallel-processing supercomputers 

Results are presented for a problem representative of turbomachinery appli- 
cations. These results validate the algorithm and provide grounds for optimism 
regarding its future application to more challenging internal and external flows. 

Based on the interim performance data available at this writing, it is 
estimated that the completed algorithm to be reported in the final version of 
this paper will produce overall speedups ranging as high as 100.


TABLES


Table I. Interim Two-Dimensional Embedding Speedups


    Inviscid Subcritical      16.4

    Inviscid Supercritical     6.1

    Turbulent Viscous         30.2


Table II. Interim Three-Dimensional Multigrid Speedups 


Inviscid Subcritical 4.7 

Inviscid Supercritical 2.5 

Turbulent Viscous 4.4 


Table III. Interim Three-Dimensional Vectorization Speedups


                              Scalar    Automatic        Explicit
                                        Vectorization    Vectorization

    Inviscid Subcritical       1.0         4.2              5.7

    Inviscid Supercritical     1.0         3.1              4.9

    Turbulent Viscous          1.0         2.6              3.6


Table IVa. Interim Two-Dimensional Multitasked Coarse-Grid Performance


                      2 Processors             4 Processors
    Machine        Speedup  Efficiency     Speedup  Efficiency

    Cray X-MP       1.78      0.89          3.06      0.77


Table IVb. Interim Two-Dimensional Multitasked Fine-Grid Performance


                      2 Processors             4 Processors
    Machine        Speedup  Efficiency     Speedup  Efficiency

    Cray X-MP       1.91      0.96          3.58      0.90

    Cray X-MP
    with
    microtasking    1.93      0.97          3.78      0.95


Table IVc. Interim Two-Dimensional Multitasked Complete Scheme Performance


                      2 Processors             4 Processors
    Machine        Speedup  Efficiency     Speedup  Efficiency

    Cray X-MP       1.87      0.94          3.30      0.83


FIGURE 1. Mesh 3

FIGURE 2. Mesh 2, in Yellow

FIGURE 3. Mesh 1, in Cyan, Overlain on Mesh 3

FIGURE 4. Meshes 1 through 3

FIGURE 5. Meshes 1 through 3 with Wing Surface Grid Lines Shown

FIGURE 6. Comparison of Fine- and Coarse-Gr

FIGURE 8. Computational Domain for Three-Dimensional Cascade

FIGURE 9. Subcritical Inviscid Flow, 26 Deg. Sweep

FIGURE 10. Supercritical Inviscid Flow, 26 Deg. Sweep